ggplot2: Basics

ggplot2: The package


ggplot2 is part of the tidyverse

#install.packages("tidyverse")
library(tidyverse)


… but it can of course also be loaded separately:

#install.packages("ggplot2")
library(ggplot2)

The big picture

Start: ggplot()

ggplot()

Components

  1. Data.
  2. Aesthetic mapping between the data and visual properties.
  3. Layer(s) to render the data.
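
To make the three components concrete, here is a minimal sketch. It assumes the `mpg` dataset that ships with ggplot2 as a stand-in; the dataset and the columns `displ` and `hwy` are not part of these slides and serve only as an illustration.

```r
library(ggplot2)

# Assumption: mpg (bundled with ggplot2) is used purely as example data.
p_components <- ggplot(
  data = mpg,                         # 1. data
  mapping = aes(x = displ, y = hwy)   # 2. aesthetic mapping
) +
  geom_point()                        # 3. layer that renders the data

p_components
```

Dropping the `geom_point()` line leaves an empty coordinate system: the data and mapping are known, but nothing is drawn yet.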

Introducing the data

library(jsonlite)
library(tidyverse)

get_gapminder <- function(repo = "https://github.com/open-numbers/ddf--gapminder--fasttrack/archive/refs/heads/master.zip",
                          keywords = "co2") {
  zip_file <- "gapminder_fasttrack_master.zip"
  gapminder_path <- "data/ddf--gapminder--fasttrack-master/"

  # Download the repository archive and unpack it into ./data
  download.file(url = repo, destfile = zip_file)
  unzip(zipfile = zip_file, exdir = "data")

  # datapackage.json lists the path of every CSV resource in the repository
  json_data <- jsonlite::fromJSON(here::here(gapminder_path, "datapackage.json"))

  # Delete the archive once it has been unpacked
  if (file.exists(zip_file)) {
    file.remove(zip_file)
  }

  # Keep only the CSV files whose path matches at least one keyword
  csv_paths <- json_data$resources$path
  matched_paths <- csv_paths[str_detect(csv_paths, str_c(keywords, collapse = "|"))]

  # Check before paste0(): pasting onto a zero-length vector returns length 1
  if (length(matched_paths) == 0) {
    stop("No files matched the specified keywords.")
  }
  matched_paths <- paste0(gapminder_path, matched_paths)

  # Read the first file, then join the remaining files onto it one by one
  merged_df <- read_csv(matched_paths[1])
  for (path in matched_paths[-1]) {
    message("Reading file: ", path)
    temp_df <- read_csv(path)

    # No `by` given: full_join() joins on all columns the two tables share
    merged_df <- full_join(merged_df, temp_df)
    rm(temp_df)
    gc()
  }

  # Create timestamp string: e.g., "2025-04-08_14-30-15"
  timestamp <- format(Sys.time(), "%Y-%m-%d_%H-%M-%S")

  # Build filename with path
  filename <- paste0("./data/gapminder_set_", timestamp, ".RDS")

  # Save RDS
  saveRDS(merged_df, filename)

  # Remove the unpacked repository; only the RDS file is kept
  if (dir.exists(gapminder_path)) {
    unlink(gapminder_path, recursive = TRUE)
  }

  return(merged_df)
}


co2_gapminder <- get_gapminder(keywords = c("population", "co2"))
pop_world <- read.csv(here::here("raw_data", "pop.csv"))
co2_world <- read.csv(here::here("raw_data", "co2_pcap_cons.csv"))

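# read.csv() prefixes column names that start with a digit with "X"; strip it
colnames(co2_world) <- gsub("^X", "", colnames(co2_world))
# Replace the Unicode minus sign with an ASCII hyphen, then convert to numeric
co2_world[, 2:ncol(co2_world)] <- co2_world[, 2:ncol(co2_world)] %>%
  mutate(across(everything(), ~ as.numeric(gsub("−", "-", as.character(.x)))))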

co2_world <- co2_world %>% 
  pivot_longer(cols = -country, 
               names_to = "year", 
               values_to = "co2")

Data

ggplot(data = movies_metadat)

Aesthetic mapping

To fill this blank canvas, we have to link the data to the visual properties we need:

mapping = aes()

Which visual properties are available depends on the type of plot. For now, the important one is position, i.e. the x- and y-axes.
But you can also, for example, change the colour of the points depending on categories in the data.
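
As a small illustration of mapping a category to colour, here is a sketch that again assumes the bundled `mpg` dataset rather than the data used in these slides; `class` is a categorical column of that dataset.

```r
library(ggplot2)

# Assumption: mpg and its columns displ, hwy and class are illustrative only.
p_colour <- ggplot(
  data = mpg,
  mapping = aes(x = displ, y = hwy, colour = class)  # colour depends on the category
) +
  geom_point()

p_colour
```

Because `colour` sits inside `aes()`, ggplot2 picks one colour per category and adds a legend. A fixed colour for all points would instead go outside `aes()`, e.g. `geom_point(colour = "steelblue")`.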

Aesthetic mapping: Axes

ggplot(data = movies_metadat, 
       mapping = aes(x = budget, 
                     y = vote_average))

Geometric Layers

ggplots are built from several layers that are stacked on top of each other with a +.

geom_

Layers

ggplot(data = movies_metadat, 
       mapping = aes(x = budget, 
                     y = vote_average)) +
  geom_point()

More layers!

ggplot(data = movies_metadat, 
       mapping = aes(x = budget, 
                     y = vote_average)) +
  geom_point() +
  geom_smooth()

Title/Labels

ggplot(data = movies_metadat, 
       mapping = aes(x = budget, 
                     y = vote_average)) +
  geom_point() +
  geom_smooth() +
  labs(
    title = "Getting a bang for your buck: Are Movies with higher budget also better?",
    subtitle = "There doesn't seem to be a strong relation between movie budget and average rating.",
    x = "Movie budget",
    y = "Average vote"
  )

Style your plot: Themes

ggplot(data = movies_metadat, 
       mapping = aes(x = budget, 
                     y = vote_average)) +
  geom_point() +
  geom_smooth() +
  labs(
    title = "Getting a bang for your buck: Are Movies with higher budget also better?",
    subtitle = "There doesn't seem to be a strong relation between movie budget and average rating.",
    x = "Movie budget",
    y = "Average vote"
  ) +
  theme_classic()

Exercise

Let’s take a deeper dive

Go through everything again in more detail here: what did we actually do? Don't get too lost in the basics; also move on to the deeper topics more quickly (scales, coord system …)
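
One possible starting point for the scales and coordinate-system part is sketched below. It assumes the bundled `mpg` dataset; the chosen scale and axis limits are arbitrary illustrations, not something these slides prescribe.

```r
library(ggplot2)

# Assumption: mpg, the log scale and the y-limits are illustrative choices.
p_scaled <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_x_log10() +                      # scale: how data values map to positions
  coord_cartesian(ylim = c(10, 40))      # coord: zoom in without dropping any data

p_scaled
```

The scale changes how values are transformed before plotting, while `coord_cartesian()` only changes the visible window of the finished plot.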

Saving
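
A minimal sketch of saving a plot with `ggsave()`, assuming the bundled `mpg` dataset and a temporary output file; the file name and dimensions are placeholders chosen for illustration.

```r
library(ggplot2)

# Assumption: example plot built from mpg; any ggplot object works the same way.
p_save <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Assumption: written to a temporary directory; use your own path in practice.
out_file <- file.path(tempdir(), "example_plot.png")

ggsave(
  filename = out_file,
  plot = p_save,
  width = 16, height = 9, units = "cm",
  dpi = 300
)
```

`ggsave()` picks the file format from the extension, so changing `.png` to `.pdf` is typically enough to get a vector graphic. If `plot` is omitted, it saves the last plot that was displayed.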

Colours

https://questionsindataviz.com/2023/12/29/what-makes-a-truly-terrible-map/